Introduction.


In the proposal, our goal was to study rank statistics to test the hypothesis $H_0$: $F^\circ(x) = G^\circ(x)$ for all $x$, where $F^\circ$ and $G^\circ$ denote the distribution functions of random variables $X^\circ$ and $Y^\circ$ respectively.

Let $X^\circ_1,\ldots,X^\circ_m$ and $Y^\circ_1,\ldots,Y^\circ_n$ be the two independent samples from $F^\circ$ and $G^\circ$ respectively, and $N = m+n$. Sample estimates of $F^\circ(x)$ and $G^\circ(x)$ are given by $F_N(x) = (\#\text{ of } X^\circ_i \le x)/m$ and $G_N(x) = (\#\text{ of } Y^\circ_j \le x)/n$ respectively, and to test $H_0$ the locally most powerful linear rank statistic is given by

    $$T_N = \int J(H_N(x))\,dF_N(x).$$

In the above statistic, $H_N(x) = \frac{m}{N}F_N(x) + \frac{n}{N}G_N(x)$. The range of integration is $(-\infty,\infty)$ here and, unless otherwise specified, in the remainder of the report.

When the X and Y observations are arbitrarily right censored, a better estimator of $F^\circ(x)$ is given by the Kaplan-Meier product limit estimator. Thus, in the arbitrarily right censored situation, it was proposed to study

    $$T_N^* = \int J(H_N^*(x))\,dF_N^*(x),$$

where $F_N^*$ and $H_N^*$ denote the product limit estimators of $F^\circ$ and $H^\circ$ respectively. Heuristic justifications for this statistic are that (i) it is a natural generalization of $T_N$, and (ii) a generalization of the Wilcoxon statistic, as obtained by Prentice (1978) using local optimum properties, contains product limit estimators. We had proposed to study asymptotic as well as small sample properties of $T_N^*$.

An important special case of $T_N$, obtained when $J(x)=x$, is known as the Wilcoxon statistic. If $J(x)=x$,

    $$T_N = \int \Bigl\{\frac{m}{N}F_N(x) + \frac{n}{N}G_N(x)\Bigr\}\,dF_N(x) = \frac{m+1}{2N} + \frac{n}{N}\int G_N(x)\,dF_N(x).$$

Since $(m+1)(2N)^{-1}$ is a constant, $T_N$ is equivalent to

    $$S_N = \int G_N(x)\,dF_N(x).$$

The statistic $S_N$ is attractive for yet another reason: it estimates $\Pr[Y^\circ < X^\circ]$. In the arbitrary right censoring situation, it would be appropriate to replace $F_N$ and $G_N$ by the Kaplan-Meier estimators $F_N^*$ and $G_N^*$ and therefore study

    $$S_N^* = \int G_N^*(x)\,dF_N^*(x).$$

It turns out that Efron (1965) had obtained $S_N^*$ as an extension of the Gilbert (1962) and Gehan (1965) statistic.
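The reduction of $T_N$ to $S_N$ for $J(x)=x$ can be checked numerically. The following is a minimal sketch using exact fractions; the data are made up and tie-free, and the function names `ecdf` and `wilcoxon_forms` are illustrative, not taken from the report.

```python
from fractions import Fraction

def ecdf(sample, x):
    """Empirical df: fraction of the sample that is <= x."""
    return Fraction(sum(1 for v in sample if v <= x), len(sample))

def wilcoxon_forms(xs, ys):
    """Return (T_N, S_N) for J(x) = x; dF_N puts mass 1/m on each X value."""
    m, n = len(xs), len(ys)
    N = m + n
    # H_N = (m/N) F_N + (n/N) G_N, evaluated at every X observation
    t_n = sum(Fraction(m, N) * ecdf(xs, x) + Fraction(n, N) * ecdf(ys, x)
              for x in xs) / m
    s_n = sum(ecdf(ys, x) for x in xs) / m
    return t_n, s_n

# Made-up, tie-free data (not taken from the report).
xs = [1.2, 3.4, 0.7, 5.1]
ys = [2.2, 0.3, 4.0]
t_n, s_n = wilcoxon_forms(xs, ys)
m, n = len(xs), len(ys)
N = m + n
# Both printed values should agree if T_N = (m+1)/(2N) + (n/N) S_N.
print(t_n, Fraction(m + 1, 2 * N) + Fraction(n, N) * s_n)
```

Here $S_N$ is simply the proportion of pairs $(x_i, y_j)$ with $y_j \le x_i$, which is why it estimates $\Pr[Y^\circ < X^\circ]$ for continuous data.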


In the first part of this report, Efron's statistic is compared with Prentice's statistic. In the second part, we consider some elementary properties of the proposed statistic, and in the third part, its small sample behavior is reported. It is observed that, unlike $H_N(x)$, $H_N^*(x)$ cannot be expressed as

    $$H_N^*(x) = \frac{m}{N}F_N^*(x) + \frac{n}{N}G_N^*(x).$$

Therefore, our statistic differs from Efron's statistic. This aspect is elaborated in section 2 with its consequences.

Results of sections 2 and 3 are preliminary. Research is still continuing on some aspects of the proposed statistic. A full report will be published at the termination of such evaluations.

1. Efron and Prentice Statistics.

Let $X^\circ$ and $Y^\circ$ be two independent random variables with distribution functions (df) $F^\circ$ and $G^\circ$ respectively. Let $U$ and $V$ be two other random variables which are independent of each other, independent of $X^\circ$ and $Y^\circ$, and with corresponding df $I_U$ and $I_V$ respectively. In arbitrarily right censored data, the observables are given in terms of

    $$X_i = \min(X_i^\circ, U_i) \quad\text{and}\quad \delta_i = \begin{cases} 1 & \text{if } X_i^\circ \le U_i, \\ 0 & \text{otherwise,} \end{cases}$$

where $i = 1,\ldots,m$. Likewise, $Y_j = \min(Y_j^\circ, V_j)$ and $\epsilon_j = 1\ (0)$ if $Y_j^\circ \le V_j$ ($Y_j^\circ > V_j$) are the observables from the other sample.

To compare $F^\circ$ and $G^\circ$, Gehan (1965) and Gilbert (1962) independently proposed the following modification of the Wilcoxon statistic. They suggested the use of $\sum_i\sum_j W_G(i,j)$, where the "score function" $W_G(i,j)$ is defined as follows:

    $$W_G(i,j) = \begin{cases} 1 & \text{if } x_i > y_j, \text{ when } y_j \text{ is an uncensored observation}, \\ 0 & \text{if } x_i < y_j, \text{ when } x_i \text{ is an uncensored observation}, \\ 1/2 & \text{if } x_i \text{ and } y_j \text{ are both censored.} \end{cases}$$

Thus, in Gehan's statistic, a pair $(x_i, y_j)$ contributes 1/2 whenever $x_i^\circ$ and $y_j^\circ$ "cannot be compared". Efron (1965) proposed a modification to the above scoring function. He argued that the score function should be an estimate of $P[X^\circ > Y^\circ]$ and that the estimate should be obtained conditional upon the available sample. This method of evaluating the score function has been shown to provide a locally most powerful rank test under type II censoring by Bhattacharyya and Mehrotra (1983). Though the justification was obtained only recently, other occurrences of this conditional argument have appeared in earlier works.
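The Gehan and Gilbert scoring rule can be sketched in a few lines. This is an illustration under stated assumptions: each observation is a hypothetical `(value, failed)` pair, ties are not treated specially, and every pair that cannot be ordered receives 1/2, following the sentence after the definition of $W_G$.

```python
def gehan_score(x, x_failed, y, y_failed):
    """Gehan/Gilbert score W_G for one pair of observed values."""
    if y_failed and x > y:
        return 1.0   # y° = y < x <= x°, so x° > y° for certain
    if x_failed and x < y:
        return 0.0   # x° = x < y <= y°, so x° < y° for certain
    return 0.5       # the pair cannot be ordered

def gehan_statistic(xs, ys):
    """Sum of W_G(i, j) over all pairs; xs and ys hold (value, failed) tuples."""
    return sum(gehan_score(x, xf, y, yf) for x, xf in xs for y, yf in ys)

# Made-up data: True marks a failure, False a censored observation.
xs = [(3.0, True), (5.0, False)]
ys = [(2.0, True), (4.0, False)]
total = gehan_statistic(xs, ys)
print(total)
```

In this toy sample only the pair of two censored values, (5.0, 4.0), cannot be ordered, so it is the only pair on which Efron's modification would change the score.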


We consider an example of the evaluation of the Efron score. If $x_i$ is a failure, $y_j$ is a censored observation, and $y_j < x_i$, then the probability of the event $\{X^\circ > Y^\circ\}$ is given by $\{G^\circ(x_i) - G^\circ(y_j)\}/\{1 - G^\circ(y_j)\}$. Clearly, the score associated with such a pair is

    $$W_E(i,j) = \frac{G_N^*(x_i) - G_N^*(y_j)}{1 - G_N^*(y_j)},$$

where $G_N^*$ is the product limit estimator of the distribution function $G^\circ$. As in Gehan's case, $W_E(i,j) = 1$ if $x_i > y_j$ and $y_j$ is uncensored, and 0 if $x_i < y_j$ and $x_i$ is uncensored. The two scoring functions differ only when the pair $(x_i, y_j)$ cannot be compared. Efron has shown that his statistic, $T_E = \sum_i\sum_j W_E(i,j)$, can be represented, up to the normalizing factor $(mn)^{-1}$, as

    $$T_E = \int G_N^*(x)\,dF_N^*(x),$$

where, as mentioned earlier, $G_N^*$ and $F_N^*$ are product limit estimators of $G^\circ$ and $F^\circ$.
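The conditional probability behind the Efron score in the example above can be sketched as follows. The function name `efron_score_censored_y` is hypothetical, and a unit exponential df stands in for the product limit estimate purely for illustration.

```python
import math

def efron_score_censored_y(G, x_i, y_j):
    """Score for a failure x_i and a censored y_j < x_i:
    P[Y° < x_i | Y° > y_j] computed from a df G."""
    if not y_j < x_i:
        raise ValueError("this case requires y_j < x_i")
    tail = 1.0 - G(y_j)
    if tail <= 0.0:
        raise ValueError("G(y_j) must be below 1")
    return (G(x_i) - G(y_j)) / tail

def unit_exponential_df(t):
    """Illustrative df only; in practice G would be the product limit estimate."""
    return 1.0 - math.exp(-t)

score = efron_score_censored_y(unit_exponential_df, 2.0, 1.0)
print(score)
```

For this pair Gehan's rule would assign 1/2, whereas the conditional score depends on how much probability the df places between $y_j$ and $x_i$.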

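To obtain the asymptotic normality, one can apply the theory of U-statistics to the earlier form, or the Chernoff and Savage theorem to the integral representation of $T_E$. Briefly, the following result is obtained.

Theorem (Efron): Let $m, n$ converge to infinity in such a way that $\lim m/N = \lambda$, $0 < \lambda < 1$. Then, the distribution of

    $$N^{1/2}\Bigl\{T_E - \int (1-F^\circ)\,dG^\circ\Bigr\}$$

converges to a normal distribution with mean zero and variance $\sigma^2$, which under the null hypothesis becomes

    $$\sigma_0^2 = \frac{1}{4}\left[\frac{1}{\lambda}\int_0^1 \frac{z^3\,dz}{1-F\{\bar F^{\circ\,-1}(z)\}} + \frac{1}{1-\lambda}\int_0^1 \frac{z^3\,dz}{1-G\{\bar F^{\circ\,-1}(z)\}}\right].$$

Here $1-F = (1-F^\circ)(1-I_U)$, $1-G = (1-G^\circ)(1-I_V)$, and $\bar F^{\circ\,-1}$ denotes the inverse of the survival function $\bar F^\circ = 1-F^\circ$.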
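As a plausibility check on the null variance, one can evaluate it numerically. The sketch below assumes the form $\sigma_0^2 = \frac14[\lambda^{-1}I_F + (1-\lambda)^{-1}I_G]$, where $I_F$ is the integral over $(0,1)$ of $z^3$ divided by the observed survival $1-F$ at the point where the true survival equals $z$. The two censoring patterns are hypothetical. Without censoring the result should reduce to the classical Wilcoxon value $1/\{12\lambda(1-\lambda)\}$.

```python
def null_variance(lam, obs_surv_x, obs_surv_y, steps=100_000):
    """Midpoint-rule value of (1/4)[(1/lam) I_x + (1/(1-lam)) I_y],
    with I = integral over (0, 1) of z**3 / obs_surv(z) dz."""
    h = 1.0 / steps
    i_x = i_y = 0.0
    for k in range(steps):
        z = (k + 0.5) * h
        i_x += z ** 3 / obs_surv_x(z) * h
        i_y += z ** 3 / obs_surv_y(z) * h
    return 0.25 * (i_x / lam + i_y / (1.0 - lam))

def no_censoring(z):
    """Observed survival equals the true survival value z."""
    return z

def proportional_censoring(z):
    """Hypothetical case: the censoring survival also equals z, so 1 - F = z * z."""
    return z * z

lam = 0.5
uncensored = null_variance(lam, no_censoring, no_censoring)
censored = null_variance(lam, proportional_censoring, proportional_censoring)
print(uncensored, censored)
```

Under these assumptions censoring inflates the variance, as one would expect from the smaller denominators $1-F$ and $1-G$.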


From the above theorem, the efficacy of Efron's statistic is

    $$\sigma_0^{-2}\Bigl\{\int (1-F^\circ)\,dG^\circ\Bigr\}^{2}. \qquad (1.1)$$


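It should be remembered that the product limit estimator of the survival function $\bar F^\circ = (1-F^\circ)$ is given by

    $$P(x) = \begin{cases} 1 & \text{if } x < x_{(1)}, \\ \prod_{i=1}^{k}\left(\dfrac{m-i}{m-i+1}\right)^{\delta_{(i)}} & \text{if } x_{(k)} \le x < x_{(k+1)}, \quad k = 1,\ldots,m, \end{cases}$$

where $x_{(m+1)} = \infty$ and $\delta_{(i)}$ is the indicator attached to the ordered observation $x_{(i)}$. Efron uses a slightly different estimator $F^*$ of $\bar F^\circ$. His estimator, which is self consistent, i.e.

    $$m F^*(s) = (\#\text{ of } x_i > s) + \sum_{x_i \le s} (1-\delta_i)\,\frac{F^*(s)}{F^*(x_i)},$$

is identical to $P(x)$ in all intervals $x_{(k)} \le x < x_{(k+1)}$ except when $k = m$. In this last interval $F^*(s) = 0$; i.e. it is assumed that $x_{(m)}$ is uncensored, irrespective of its actual value.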
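The product limit estimator and Efron's modification in the last interval can be sketched as below. This is a minimal illustration with exact fractions. It assumes no ties, each observation is a hypothetical `(value, failed)` pair, and the flag `last_as_failure` is an illustrative name for Efron's convention.

```python
from fractions import Fraction

def product_limit(data, last_as_failure=False):
    """Return [(x_(k), estimate on [x_(k), x_(k+1)))] for the survival function.
    data: list of (value, failed). With last_as_failure=True the largest
    observation is treated as a failure, as in Efron's estimator."""
    ordered = sorted(data)            # ascending by observed value
    m = len(ordered)
    surv = Fraction(1)
    steps = []
    for i, (x, failed) in enumerate(ordered, start=1):
        if failed or (last_as_failure and i == m):
            surv *= Fraction(m - i, m - i + 1)
        steps.append((x, surv))
    return steps

def survival_at(steps, s):
    """Evaluate the right-continuous step function at the point s."""
    value = Fraction(1)
    for x, surv in steps:
        if x <= s:
            value = surv
    return value

steps_a = product_limit([(1, True), (2, False), (3, True)])
km_b = product_limit([(1, True), (2, True), (3, False)])
efron_b = product_limit([(1, True), (2, True), (3, False)], last_as_failure=True)
print(steps_a, km_b, efron_b)
```

The two estimators for the second sample agree everywhere except beyond the largest observation, where the unmodified product limit estimate stays positive because that observation is censored.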


Using a conditional locally most powerful criterion, Prentice (1978) obtained a rank statistic $T_P$ for testing $H_0$. His statistic is given by

    $$T_P = \sum_i \Bigl\{ c_i z_{(i)} + C_i \sum_j z_{(ij)} \Bigr\},$$

where the outer sum is over all failures, and the inner sum is over all those observations which are censored between the $i$th and the $(i+1)$th failures; $z_{(i)}$, and likewise $z_{(ij)}$, takes value 1 if the $i$th failure ($j$th censored observation) is an X observation, and value 0 otherwise. The score $c_i$ ($C_i$) corresponds to a failure (censored) observation. From the equivalence established in Mehrotra, Michalek, and Mihalko (1983), Prentice's statistic can be written in a form whose asymptotic normality has been established by Schoenfeld (1982). Briefly, the efficacy of the Prentice statistic is given by

    $$\frac{\Bigl[\int g(t)\log\{r_2(t)/r_1(t)\}\,\pi(t)\{1-\pi(t)\}\,V(t)\,dt\Bigr]^2}{\int g^2(t)\,\pi(t)\{1-\pi(t)\}\,V(t)\,dt}, \qquad (1.2)$$

where $g(t) = \lim (c_i - C_i)$, $r_1(t) = f^\circ(t)/\{1-F^\circ(t)\}$, $r_2(t) = g^\circ(t)/\{1-G^\circ(t)\}$, $\pi(t) = (1-\lambda)\{1-G(t)\}/[\lambda\{1-F(t)\} + (1-\lambda)\{1-G(t)\}]$, and $V(t) = \lambda f^\circ(t)\{1-I_U(t)\} + (1-\lambda) g^\circ(t)\{1-I_V(t)\}$.

The ratio of (1.1) and (1.2) gives the relative efficiency of Efron's statistic versus Prentice's statistic.
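The structure of the Prentice statistic, one score per failure plus one score per observation censored before the next failure, can be sketched as follows. The score values below are placeholders, not Prentice's actual scores. The function name, the `(failed, is_x)` layout, and the choice to let observations censored before the first failure contribute nothing are assumptions made for illustration.

```python
def prentice_statistic(pooled, c, C):
    """pooled: time-ordered list of (failed, is_x) for the combined sample.
    c[i] is the score of the (i+1)-th failure; C[i] is the score of any
    observation censored after that failure and before the next one."""
    total = 0.0
    failure_index = -1                 # no failure seen yet
    for failed, is_x in pooled:
        if failed:
            failure_index += 1
            if is_x:                   # z_(i) = 1
                total += c[failure_index]
        elif failure_index >= 0 and is_x:   # z_(ij) = 1
            total += C[failure_index]
    return total

# Made-up combined sample: X failure, X censored, Y failure, Y censored, X censored.
pooled = [(True, True), (False, True), (True, False), (False, False), (False, True)]
c = [1.0, 0.5]        # placeholder failure scores
C = [-0.5, -0.25]     # placeholder censored scores
t_p = prentice_statistic(pooled, c, C)
print(t_p)
```

Only X observations contribute, so the statistic is a linear rank statistic in the indicators $z_{(i)}$ and $z_{(ij)}$.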


2. Properties of the Proposed Statistic.

The combined ranked sample of X's and Y's can be written in terms of three vectors $W$, $\Delta$, and $Z$, each with dimension $N$. The vector $W = (W_1,\ldots,W_N)$ represents the combined vector of ordered observations $W_1 \le \ldots \le W_N$. $\Delta$ is a vector of indicator variables, where $\Delta_i$ takes value 1 if $W_i$ is a failure, and 0 otherwise. $Z$ is another vector of indicator variables, with $Z_i$ taking value 1 if $W_i$ is an X observation and 0 otherwise. Using these notations, the product limit estimator of the survival function $\bar H^\circ(x) = 1 - H^\circ(x)$, where $H^\circ(x) = mN^{-1}F^\circ(x) + nN^{-1}G^\circ(x)$, at $w_j$ is given by

    $$\bar H_N^*(w_j) = \prod_{i=1}^{j}\left(\frac{N-i}{N-i+1}\right)^{\Delta_i} = \bar H_N^*(w_{j-1})\left(\frac{N-j}{N-j+1}\right)^{\Delta_j}.$$

The second factor on the right hand side of the above expression can be expressed in terms of X and Y failures as

    $$\frac{N-j}{N-j+1} = \frac{\bigl(m - \sum_{k=1}^{j} Z_k\bigr) + \bigl\{n - \sum_{k=1}^{j}(1-Z_k)\bigr\}}{\bigl(m - \sum_{k=1}^{j-1} Z_k\bigr) + \bigl\{n - \sum_{k=1}^{j-1}(1-Z_k)\bigr\}}. \qquad (2.1)$$

By convention, $\sum_{k=1}^{0} Z_k = \sum_{k=1}^{0}(1-Z_k) = 0$. On the other hand, the product limit estimators of the survival functions $\bar F^\circ$ and $\bar G^\circ$ are given by

    $$\bar F_N^*(w_j) = \bar F_N^*(w_{j-1})\left[\frac{m - \sum_{k=1}^{j} Z_k}{m - \sum_{k=1}^{j-1} Z_k}\right]^{\Delta_j} \qquad (2.2)$$

and

    $$\bar G_N^*(w_j) = \bar G_N^*(w_{j-1})\left[\frac{n - \sum_{k=1}^{j}(1-Z_k)}{n - \sum_{k=1}^{j-1}(1-Z_k)}\right]^{\Delta_j}. \qquad (2.3)$$

From equations (2.1), (2.2), and (2.3), it is clear that, in the case of arbitrary right censoring,

    $$\bar H_N^*(x) \ne \frac{m}{N}\bar F_N^*(x) + \frac{n}{N}\bar G_N^*(x).$$


As a consequence, the proposed statistic and Efron's statistic differ from each other when $J(x)=x$. In other words, the proposed statistic is another generalization of the Wilcoxon statistic. Another interesting consequence of this observation is that, for $J(x)=x$, the statistic does not estimate $\{\text{constant} + \Pr[Y^\circ < X^\circ]\}$. This appears to be a drawback of the proposed statistic. However, weights should be assigned according to the ranks in the combined sample. This justifies the usefulness of the proposed statistic, which consequently requires further investigation.
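The failure of the mixture identity under censoring can be seen on a toy example. The sketch below is illustrative only: the data are made up, ties are not handled, and `km_survival` and `mixture_gap` are hypothetical helper names. It compares the pooled product limit estimate with the $m/N$, $n/N$ mixture of the two sample-wise estimates.

```python
from fractions import Fraction

def km_survival(sample):
    """sample: list of (value, failed). Returns a function s -> product limit
    estimate of the survival function at s (right-continuous)."""
    ordered = sorted(sample)
    size = len(ordered)
    def surv(s):
        value = Fraction(1)
        for i, (x, failed) in enumerate(ordered, start=1):
            if x <= s and failed:
                value *= Fraction(size - i, size - i + 1)
        return value
    return surv

def mixture_gap(xs, ys, s):
    """Pooled estimate minus (m/N) * X-estimate minus (n/N) * Y-estimate at s."""
    m, n = len(xs), len(ys)
    N = m + n
    pooled = km_survival(xs + ys)(s)
    mixture = Fraction(m, N) * km_survival(xs)(s) + Fraction(n, N) * km_survival(ys)(s)
    return pooled - mixture

ys = [(2, True), (4, True)]
censored_xs = [(1, True), (3, False)]     # second X observation is censored
complete_xs = [(1, True), (3, True)]      # same values, no censoring
print(mixture_gap(censored_xs, ys, 4), mixture_gap(complete_xs, ys, 4))
```

In this example the two sides agree up to the first censored observation and separate afterwards, while the uncensored sample reproduces the usual identity for $H_N$.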


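The mass function associated with the product limit estimator can be shown to be

    $$f_N^*(w_k) = \Bigl\{ m - \sum_{i<k} \frac{(1-\Delta_i) Z_i}{\bar F_N^*(w_i)} \Bigr\}^{-1}$$

if $w_k$ is an X observation and is uncensored. Hence, alternatively,

    $$f_N^*(w_k) = m^{-1}\Bigl\{ 1 + \sum_{i<k} \frac{(1-\Delta_i) Z_i}{\bar F_N^*(w_i)}\, f_N^*(w_k) \Bigr\}$$

whenever $Z_k = 1$ and $\Delta_k = 1$. The proposed statistic $T_N^*$ can be expressed as

    $$T_N^* = m^{-1} \sum_{k:\,Z_k\Delta_k = 1} J[H_N^*(w_k)] \Bigl\{ 1 + \sum_{i<k} \frac{(1-\Delta_i) Z_i}{\bar F_N^*(w_i)}\, f_N^*(w_k) \Bigr\}.$$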


After changing the order of summation, the second term gives an interesting interpretation of $T_N^*$. Essentially, it amounts to assigning a contribution $H_N^*(w_k)$ at each X failure. From each censored observation that falls between two failures, the contribution is the weighted sum of the mass function $f_N^*(w_j)$; the weights are proportional to $H_N^*(w_j)$ and the summation is over all future X-censored observations. Prentice's statistic is similar in nature and differs in the scores associated with the censored observations.
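The mass function of the product limit estimator can be checked against its jumps. The sketch below assumes the redistribution form, in which the mass at an X failure $w_k$ equals $\{m - \sum 1/\bar F_N^*(w_i)\}^{-1}$ with the sum taken over earlier censored X observations. It works on a single made-up X sample, and the name `km_masses` is illustrative.

```python
from fractions import Fraction

def km_masses(sample):
    """sample: list of (value, failed) for one group, no ties.
    Returns (jumps, formula): product limit jump sizes and the values
    1 / (m - sum over earlier censored points of 1 / survival there)."""
    ordered = sorted(sample)
    m = len(ordered)
    surv = Fraction(1)
    redistributed = Fraction(0)   # running sum of 1 / survival at censored points
    jumps, formula = {}, {}
    for i, (x, failed) in enumerate(ordered, start=1):
        if failed:
            new_surv = surv * Fraction(m - i, m - i + 1)
            jumps[x] = surv - new_surv
            formula[x] = 1 / (m - redistributed)
            surv = new_surv
        else:
            redistributed += 1 / surv
    return jumps, formula

# Made-up sample: failures at 1, 3, 5 and censored observations at 2, 4.
sample = [(1, True), (2, False), (3, True), (4, False), (5, True)]
jumps, formula = km_masses(sample)
print(jumps, formula)
```

Each censored observation passes its share of mass to the failures that follow it, which is the mechanism behind the weighted sums described above.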


To obtain the asymptotic normality of the statistic, we consider the following representation of $T_N^*$:

    $$T_N^* = \int J[H_N^*(x)]\,dF_N^*(x) = \int J[H^\circ(x)]\,dF^\circ(x) + \int \{J[H_N^*(x)] - J[H^\circ(x)]\}\,dF^\circ(x)$$
    $$\qquad + \int J[H^\circ(x)]\,d[F_N^*(x) - F^\circ(x)] + \int \{J[H_N^*(x)] - J[H^\circ(x)]\}\,d[F_N^*(x) - F^\circ(x)].$$

The first term of the above expression is a constant, and under some regularity conditions on the behavior of $J(x)$, the last term is asymptotically negligible. On the middle terms, one can apply the property that $N^{1/2}\{H_N^*(s) - H^\circ(s)\}$, considered as a stochastic process in $s$, approaches a normal process with zero mean and covariance kernel

    $$\{1-H^\circ(s)\}\{1-H^\circ(t)\} \int_{-\infty}^{s \wedge t} \frac{dH^\circ(z)}{\{1-H^\circ(z)\}\{1-H(z)\}}.$$

As a consequence, the two middle terms are asymptotically normally distributed with zero mean and appropriately obtained variance.


These and other related details are still under further investigation.

3. Small Sample Behavior of $T_N^*$.

At the present time, we have investigated the behavior of $T_N^*$ for $J(x)=x$, which provides a generalized Wilcoxon statistic. We have compared this statistic with Prentice's statistic which, in this section, will be denoted by $T_P$. This comparison is made when the X and Y observations are generated from logistic populations. Prentice's statistic is obtained with the appropriate "logistic" weights. Censoring varies over 0%, 10%, 30%, and 50%. The zero percent censoring is used to check the accuracy of the simulation results. Clearly, in this case, $T_N^*$ and $T_P$ are essentially the same, and both are equivalent to the Wilcoxon statistic. In the alternative situation, the censored Y observations are generated by varying the location parameter of the logistic distribution from 0.1 to 0.9. In every case, the censoring distribution is a uniform whose range is chosen so that the desired censoring probability is attained. The sample sizes of the populations X and Y are both kept equal to 10. The power is obtained from 1000 repetitions. Table 1 shows 1000 × power of the statistics $T_N^*$ and $T_P$.
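One replication of such a simulation might be sketched as follows. This is not the code used for the report. The uniform censoring range is passed in directly rather than solved for a target censoring probability, no critical values are computed, and all function names are hypothetical. The statistic is the proposed one with $J(x)=x$, i.e. the sum over X failures of $H_N^*(w_k)$ times the product limit mass of the X sample at $w_k$.

```python
import math
import random

def logistic_draw(rng, location=0.0):
    """Inverse-cdf draw from a logistic distribution with unit scale."""
    u = rng.random()
    while u == 0.0:                      # avoid log(0)
        u = rng.random()
    return location + math.log(u / (1.0 - u))

def censored_sample(rng, size, location, censor_low, censor_high):
    """Return a list of (observed value, failed) with uniform censoring times."""
    out = []
    for _ in range(size):
        life = logistic_draw(rng, location)
        cens = rng.uniform(censor_low, censor_high)
        out.append((min(life, cens), life <= cens))
    return out

def proposed_statistic(xs, ys):
    """T*_N for J(x) = x, computed in one pass over the pooled ordered sample."""
    pooled = sorted([(v, f, True) for v, f in xs] + [(v, f, False) for v, f in ys])
    N, x_at_risk = len(pooled), len(xs)
    h_surv = 1.0      # pooled product limit survival estimate
    f_surv = 1.0      # product limit survival estimate of the X sample
    total = 0.0
    for j, (value, failed, is_x) in enumerate(pooled, start=1):
        if failed:
            h_surv *= (N - j) / (N - j + 1)
        if is_x:
            if failed:
                mass = f_surv / x_at_risk    # jump of the X estimate at this failure
                f_surv -= mass
                total += (1.0 - h_surv) * mass
            x_at_risk -= 1
    return total

rng = random.Random(0)
xs = censored_sample(rng, 10, 0.0, -1.0, 6.0)   # illustrative censoring range
ys = censored_sample(rng, 10, 0.5, -1.0, 6.0)
print(proposed_statistic(xs, ys))
```

Repeating this for many replications and comparing the statistic with a null critical value would give a power estimate; how the critical values were obtained is not stated in the report and is left open here.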


TABLE 1: Simulated Powers of the Prentice and Proposed Statistics.

                          Probability of censoring
Location         0.0            0.1            0.3            0.5
parameter     T_P   T*_N     T_P   T*_N     T_P   T*_N     T_P   T*_N

   0.0         50     50      50     50      50     50      50     50
   0.1         56     56      60     65      47     57      55     76
   0.2         59     59      81     82      55     69      65     80
   0.3         84     84      96    106      65     78      71     79
   0.4        109    109     115    121      99    119     107    113
   0.5        148    148     151    165     136    134     133    151
   0.6        167    167     179    195     147    171     147    156
   0.7        196    196     185    190     175    173     160    146
   0.8        225    225     236    230     189    188     192    179
   0.9        292    292     285    296     225    227     180    190

Our simulation results show that the power of the proposed statistic $T_N^*$ is generally larger than the power of $T_P$, though the difference is relatively small. This leads us to believe that $T_N^*$ will continue to perform at least as well as Prentice's statistic, even if the X's and Y's are generated from other distributions. Of course, the J function in $T_N^*$ and the scores in the Prentice statistic must be chosen appropriately.

A simulation study, covering a wider range of distributions and sample sizes, is in progress. A technical report, based on this large study, is expected in the near future.